Batch Reinforcement Learning (RL) algorithms attempt to choose a policy from a designer-provided class of policies given a fixed set of training data. Choosing the policy that maximizes an estimate of return often leads to overfitting when only limited data are available, because the policy class is large relative to the amount of data. In this work, we focus on learning policy classes that are appropriately sized for the amount of data available. We accomplish this by applying the principle of Structural Risk Minimization, from Statistical Learning Theory, which uses Rademacher complexity to identify a policy class that maximizes a bound on the return of the best policy in the chosen class, given the available data. Unlike similar batch RL approaches, our bound on return requires only extremely weak assumptions about the true system.
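The central quantity in the approach sketched above is the Rademacher complexity of a policy class. As an illustration only (the paper's actual estimator and bound are not reproduced here), the following is a minimal Monte Carlo sketch of the *empirical* Rademacher complexity for a finite policy class, assuming we already have per-trajectory return estimates for each candidate policy; the function name and data layout are hypothetical.

```python
import random

def empirical_rademacher(returns_by_policy, n_draws=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite policy class.

    returns_by_policy: list of lists, where returns_by_policy[k][i] is the
    estimated return of policy k on trajectory i (hypothetical layout).
    The estimate averages, over random sign vectors sigma in {-1, +1}^n,
    the supremum over the class of the sigma-weighted empirical mean.
    """
    rng = random.Random(seed)
    n = len(returns_by_policy[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw a Rademacher sign vector: each entry is -1 or +1 with prob 1/2.
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Supremum over the (finite) policy class of the signed empirical mean.
        total += max(
            sum(s * r for s, r in zip(sigma, returns)) / n
            for returns in returns_by_policy
        )
    return total / n_draws
```

A larger policy class can only increase this supremum, so the estimate grows with class size; this is the penalty that Structural Risk Minimization trades off against empirical return when selecting an appropriately sized class.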